An illustration with scanner and crowd-sourced nutritional datasets
Insee (French official statistics administration)
4/21/23
Perspective
Understanding the nature and the nutritional or environmental quality of the food products consumed helps to develop sustainable and healthy consumption
Applications have emerged to help consumers get to know available products better:
Justification
Crowd-sourced databases open up new perspectives on the analysis of scanner data at population scale once the two sources have been matched.
Research question
Enrichment of scanner data with several sources of information using advanced fuzzy matching methods
ElasticSearch) and embeddings to associate pairs
RelevanC data)
Tip
Open Food Facts)

| Type of information | Examples |
|---|---|
| Aggregated quality indices | Nutriscore, NOVA score, Ecoscore… |
| Nutritional information | Energy, carbohydrates, fat… |
| Product information | Packaging, volume… |
[Figure: wordclouds before preprocessing; panels: RelevanC, Open Food Facts]
Reduce noise in the datasets;
Harmonize the different sources;
Identify non-food products that remain despite category filtering.
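
A minimal sketch of this kind of label cleaning, under stated assumptions: the rules (lowercasing, dropping tokens that contain digits, removing punctuation and a few stopwords) are inferred from the tokenized labels shown later in these slides, and the `STOPWORDS` set and the function name `preprocess_label` are hypothetical, not the actual `foodbowl` implementation.

```python
import re

# Hypothetical stopword list; the list used in the real pipeline is not given.
STOPWORDS = {"le", "la", "les", "de", "du", "des", "bq"}


def preprocess_label(label: str) -> str:
    """Normalize a raw product label into a lowercase, noise-free string."""
    text = label.lower().replace("-", "")  # "coca-cola" becomes "cocacola"
    # Drop tokens containing digits (weights, volumes, calibers such as "35/45").
    tokens = [t for t in text.split() if not any(c.isdigit() for c in t)]
    # Replace remaining punctuation with spaces ("prov.msc" becomes "prov msc").
    text = re.sub(r"[^a-zà-ÿ\s]", " ", " ".join(tokens))
    # Remove stopwords and collapse whitespace.
    return " ".join(t for t in text.split() if t not in STOPWORDS)


print(preprocess_label("LE PANIER FAISSELLE BIO 4X100G"))   # panier faisselle bio
print(preprocess_label("SAUMON SAUVAGE PROV.MSC 330G BQ"))  # saumon sauvage prov msc
```

Applying the same function to both sources is one simple way to harmonize them before matching.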
[Figure: wordclouds after preprocessing; panels: RelevanC, Open Food Facts]
Objective
EAN found in Open Food Facts
Open Food Facts products sharing the same COICOP (see our classification algorithm)
Open Food Facts
CIQUAL & Wikipedia dictionaries (normalized named products)
Important point
Ricard and Pastis
Idea
Tip
We have a way to learn how to link scanner labels and crowd-sourced labels!
| Scanner data | Open Food Facts |
|---|---|
| Beurre aux truffes | Beurre aux truffes |
| Ricard FA18 | Pastis de Marseille |
| Tartiflette William Saurin | Tartiflette au reblochon |
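
One way to read the table above: products whose EAN appears in both sources provide label pairs that can serve as supervision. The sketch below assumes this reading; the EAN values are placeholders and `build_positive_pairs` is a hypothetical helper, not part of the authors' code.

```python
def build_positive_pairs(scanner: dict, off: dict) -> list:
    """Pair the scanner label and the Open Food Facts label of every shared EAN.

    Both arguments map an EAN (string) to a product label.
    """
    shared_eans = sorted(scanner.keys() & off.keys())
    return [(scanner[ean], off[ean]) for ean in shared_eans]


# Placeholder EANs; labels are taken from the table above.
scanner_labels = {
    "0001": "Beurre aux truffes",
    "0002": "Ricard FA18",
    "0003": "Tartiflette William Saurin",
    "0004": "Product without EAN match",
}
off_labels = {
    "0001": "Beurre aux truffes",
    "0002": "Pastis de Marseille",
    "0003": "Tartiflette au reblochon",
}

pairs = build_positive_pairs(scanner_labels, off_labels)
for scanner_label, off_label in pairs:
    print(f"{scanner_label!r} <-> {off_label!r}")
```

Pairs such as "Ricard FA18" and "Pastis de Marseille" share no token, which is why a learned representation is more helpful here than plain string similarity.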
PyTorch:
FastText embedding)

| Linkage step | Casino: products sold (%) | Casino: revenue (%) | Franprix: products sold (%) | Franprix: revenue (%) | Monoprix: products sold (%) | Monoprix: revenue (%) |
|---|---|---|---|---|---|---|
| Products not found | 0.8 | 1.4 | 0.5 | 0.9 | 0.9 | 1.3 |
| EAN Matching (step 1) | 72.1 | 65.0 | 83.5 | 81.1 | 68.6 | 66.0 |
| Fuzzy matching with Open Food Facts, restrictive (step 2) | 24.6 | 29.3 | 12.9 | 14.6 | 26.9 | 27.5 |
| Fuzzy matching with Open Food Facts, less restrictive (step 3) | 1.7 | 2.8 | 2.0 | 2.1 | 2.7 | 3.6 |
| Fuzzy matching with CIQUAL (step 4) | 0.7 | 1.6 | 1.2 | 1.4 | 0.9 | 1.5 |
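
The stepwise logic behind this table can be sketched as a cascade in which each product stops at the first step that yields a match. This is an illustration only: `difflib.SequenceMatcher` stands in for the ElasticSearch and embedding-based fuzzy matching, "restrictive" is assumed to mean "restricted to the same COICOP", and the 0.8 threshold, the data and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def best_fuzzy(label, candidates, threshold=0.8):
    """Return the candidate most similar to `label`, or None below the threshold."""
    best, best_score = None, threshold
    for cand in candidates:
        score = SequenceMatcher(None, label, cand).ratio()
        if score >= best_score:
            best, best_score = cand, score
    return best


def link_product(product, off_by_ean, off_products, ciqual_labels):
    """Return (step, matched_label); step 0 means the product was not found."""
    # Step 1: exact match on the EAN barcode.
    if product["ean"] in off_by_ean:
        return 1, off_by_ean[product["ean"]]
    # Step 2: fuzzy match restricted to products sharing the same COICOP.
    same = [p["label"] for p in off_products if p["coicop"] == product["coicop"]]
    match = best_fuzzy(product["label"], same)
    if match is not None:
        return 2, match
    # Step 3: fuzzy match against all Open Food Facts products.
    match = best_fuzzy(product["label"], [p["label"] for p in off_products])
    if match is not None:
        return 3, match
    # Step 4: fall back on generic CIQUAL product names.
    match = best_fuzzy(product["label"], ciqual_labels)
    return (4, match) if match is not None else (0, None)


# Toy data for illustration only.
off_by_ean = {"111": "beurre aux truffes"}
off_products = [
    {"label": "tartiflette au reblochon", "coicop": "01.1.9"},
    {"label": "pastis de marseille", "coicop": "02.1.1"},
]
ciqual_labels = ["abricot"]

product = {"ean": "999", "label": "tartiflette reblochon", "coicop": "01.1.9"}
print(link_product(product, off_by_ean, off_products, ciqual_labels))
```

Ordering the steps from most to least reliable means a weaker method is only used for products that the stronger ones could not link.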
FastText classification model: justification
Tip
When performing record linkage, a blocking variable is useful:
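
To make the idea concrete, the sketch below compares the number of candidate pairs with and without blocking. It assumes the COICOP code is the blocking variable; the truncated COICOP codes, the toy records and the helper `candidate_pairs` are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(scanner, reference, block_key=None):
    """List the (scanner, reference) record pairs that need to be compared.

    Without `block_key`, every pair is a candidate. With it, only records
    sharing the same value for that key are compared.
    """
    if block_key is None:
        return [(s, r) for s in scanner for r in reference]
    blocks = defaultdict(list)
    for r in reference:
        blocks[r[block_key]].append(r)
    return [(s, r) for s in scanner for r in blocks.get(s[block_key], [])]


# Toy records; COICOP codes are truncated to three levels for illustration.
scanner = [
    {"label": "navarin agneau", "coicop": "01.1.2"},
    {"label": "abricot", "coicop": "01.1.6"},
]
reference = [
    {"label": "navarin petit legumes agneau francais", "coicop": "01.1.2"},
    {"label": "abricot", "coicop": "01.1.6"},
    {"label": "cone citron meringue", "coicop": "01.1.8"},
    {"label": "miel romarin", "coicop": "01.1.8"},
]

all_pairs = candidate_pairs(scanner, reference)
blocked_pairs = candidate_pairs(scanner, reference, block_key="coicop")
print(len(all_pairs), len(blocked_pairs))  # 8 2
```

Blocking both cuts the number of comparisons and rules out matches across unrelated product families, at the cost of missing a match whenever the blocking variable itself is wrong.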
FastText classification model: examples

| Initial label | Tokenized label | COICOP | Label |
|---|---|---|---|
| LE PANIER FAISSELLE BIO 4X100G | panier faisselle bio | 01.1.1.5.1.9999 | Bread and cereals |
| NAVARIN AGNEAU 1,2KE | navarin agneau | 01.1.2.8.3.0010 | Meat |
| SAUMON SAUVAGE PROV.MSC 330G BQ | saumon sauvage prov msc | 01.1.3.6.1.9999 | Fish and shellfish |
| ABRICOT 35/45 BQ 1KG | abricot | 01.1.6.3.1.0005 | Fruits |
| POTE AUVERGNATE 400G | pote auvergnate | 01.1.7.6.1.9999 | Vegetables |
| MIEL ROMARIN HAUTE VALLES 480G | miel romarin haute valles | 01.1.8.4.1.0016 | Confectionery and frozen products |
| ENTREMETS CITRON MERINGUE 6P 500G | entremets citron meringue | 01.1.8.5.1.9999 | Confectionery and frozen products |
| HERBE MENTHE POT | herbe menthe pot | 01.1.9.2.1.0017 | Salt, spices and sauces |
| COCA-COLA ZERO PET 1.5LX6 CONT MAST | cocacola zero pet cont mast | 01.2.2.2.1.0006 | Other soft drinks |
| BISTROT DE FRANCE RS BIB 5L | bistrot france rs bib | 02.1.2.1.1.0004 | Wines, ciders and champagne |
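
For readers unfamiliar with fastText, a supervised model is trained from a plain-text file in which each line carries a label prefixed with `__label__` (the library default) followed by the text. The sketch below only formats a few rows of the table above in that way; the helper name and the file name in the comment are hypothetical, and the training settings used by the authors are not given in these slides.

```python
def to_fasttext_line(tokenized_label: str, coicop: str) -> str:
    """Format one training example in fastText's supervised input format."""
    return f"__label__{coicop} {tokenized_label}"


# (tokenized label, COICOP code) pairs taken from the table above.
examples = [
    ("panier faisselle bio", "01.1.1.5.1.9999"),
    ("navarin agneau", "01.1.2.8.3.0010"),
    ("abricot", "01.1.6.3.1.0005"),
]

lines = [to_fasttext_line(text, code) for text, code in examples]
print("\n".join(lines))

# With the `fasttext` package installed, the lines could be written to a file
# (for instance "train.txt", a hypothetical name) and passed to
# fasttext.train_supervised(input="train.txt").
```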
Python controls this polyglot pipeline (foodbowl 🍜 package):
ElasticSearch and S3;
ElasticSearch requests;
ElasticSearch;
PyTorch trained word embedding)
Perspective
Future work is needed to make our models public (maybe using FastAPI)
| RelevanC: original label | RelevanC: preprocessed label | Open Food Facts: preprocessed label | Nutrients: energy (per 100 g) |
|---|---|---|---|
| HERBE MENTHE POT | herbe menthe pot | menthe bio pot | 180 |
| MIEL ROMARIN HAUTE VALLES 480G | miel romarin haute valles | miel romarin | 336 |
| POTE AUVERGNATE 400G | pote auvergnate | truffade auvergnate | 682 |
| LE PANIER FAISSELLE BIO 4X100G | panier faisselle bio | panier | 389 |
| COCA-COLA ZERO PET 1.5LX6 CONT MAST | cocacola zero pet cont mast | cocacola pet | 126 |
| NAVARIN AGNEAU 1,2KE | navarin agneau | navarin petit legumes agneau francais | 197 |
| ABRICOT 35/45 BQ 1KG | abricot | abricot | 84 |
| BISTROT DE FRANCE RS BIB 5L | bistrot france rs bib | rs | 1540 |
| SAUMON SAUVAGE PROV.MSC 330G BQ | saumon sauvage prov msc | msc oeufs saumon sauvage | 866 |
| ENTREMETS CITRON MERINGUE 6P 500G | entremets citron meringue | cone citron meringue | 1142 |
Warning
The same product can appear under slightly different names in Open Food Facts. Such duplicates could be treated as admissible pairs. Here, however, they are considered inadmissible, like any other product.
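
The consequence of this choice can be illustrated with a small sampling sketch. It assumes that inadmissible examples are drawn at random among all Open Food Facts labels other than the admissible one; the sampling scheme, the near-duplicate label and the helper `sample_inadmissible` are hypothetical.

```python
import random


def sample_inadmissible(pairs, off_labels, rng):
    """For each admissible pair, draw one other Open Food Facts label.

    Only the exact admissible label is excluded, so a near-duplicate of it
    can still be drawn and will be treated as inadmissible.
    """
    triplets = []
    for scanner_label, admissible in pairs:
        candidates = [label for label in off_labels if label != admissible]
        triplets.append((scanner_label, admissible, rng.choice(candidates)))
    return triplets


pairs = [("Ricard FA18", "Pastis de Marseille")]
off_labels = [
    "Pastis de Marseille",
    "Pastis de Marseille 45",  # hypothetical near-duplicate entry
    "Tartiflette au reblochon",
    "Beurre aux truffes",
]

triplets = sample_inadmissible(pairs, off_labels, random.Random(0))
print(triplets)
```

In this toy setting the near-duplicate is as likely to be drawn as any unrelated product, which is exactly the situation described in the warning.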